observation bias
Appendix
A Extended Related Work There are two groups of approaches to debias click data for ranking. The first is to model user's behavior to infer relevance from biased click signals, known as click models [16, 21, 20, 9, 8]. However, most click models focus on predicting clicks, rather than optimizing the ranking performance [29, 4]. They are usually separated from the L TR frameworks, and the relevance inference is an afterthought [4]. The second group tries to directly learn unbiased ranking models from biased clicks, known as unbiased learning to rank (UL TR).
Error Analysis of Shapley Value-Based Model Explanations: An Informative Perspective
Zhao, Ningsheng, Yu, Jia Yuan, Dzieciolowski, Krzysztof, Bui, Trang
Shapley value attribution (SVA) is an increasingly popular explainable AI (XAI) method, which quantifies the contribution of each feature to the model's output. However, recent work has shown that most existing methods to implement SVAs have some drawbacks, resulting in biased or unreliable explanations that fail to correctly capture the true intrinsic relationships between features and model outputs. Moreover, the mechanism and consequences of these drawbacks have not been discussed systematically. In this paper, we propose a novel error theoretical analysis framework, in which the explanation errors of SVAs are decomposed into two components: observation bias and structural bias. We further clarify the underlying causes of these two biases and demonstrate that there is a trade-off between them. Based on this error analysis framework, we develop two novel concepts: over-informative and underinformative explanations. We demonstrate how these concepts can be effectively used to understand potential errors of existing SVA methods. In particular, for the widely deployed assumption-based SVAs, we find that they can easily be under-informative due to the distribution drift caused by distributional assumptions. We propose a measurement tool to quantify such a distribution drift. Finally, our experiments illustrate how different existing SVA methods can be over- or under-informative. Our work sheds light on how errors incur in the estimation of SVAs and encourages new less error-prone methods.
Mitigating Observation Biases in Crowdsourced Label Aggregation
Ueda, Ryosuke, Takeuchi, Koh, Kashima, Hisashi
Crowdsourcing has been widely used to efficiently obtain labeled datasets for supervised learning from large numbers of human resources at low cost. However, one of the technical challenges in obtaining high-quality results from crowdsourcing is dealing with the variability and bias caused by the fact that it is humans execute the work, and various studies have addressed this issue to improve the quality by integrating redundantly collected responses. In this study, we focus on the observation bias in crowdsourcing. Variations in the frequency of worker responses and the complexity of tasks occur, which may affect the aggregation results when they are correlated with the quality of the responses. We also propose statistical aggregation methods for crowdsourcing responses that are combined with an observational data bias removal method used in causal inference. Through experiments using both synthetic and real datasets with/without artificially injected spam and colluding workers, we verify that the proposed method improves the aggregation accuracy in the presence of strong observation biases and robustness to both spam and colluding workers.
FairRoad: Achieving Fairness for Recommender Systems with Optimized Antidote Data
Fang, Minghong, Liu, Jia, Momma, Michinari, Sun, Yi
Today, recommender systems have played an increasingly important role in shaping our experiences of digital environments and social interactions. However, as recommender systems become ubiquitous in our society, recent years have also witnessed significant fairness concerns for recommender systems. Specifically, studies have shown that recommender systems may inherit or even amplify biases from historical data, and as a result, provide unfair recommendations. To address fairness risks in recommender systems, most of the previous approaches to date are focused on modifying either the existing training data samples or the deployed recommender algorithms, but unfortunately with limited degrees of success. In this paper, we propose a new approach called fair recommendation with optimized antidote data (FairRoad), which aims to improve the fairness performances of recommender systems through the construction of a small and carefully crafted antidote dataset. Toward this end, we formulate our antidote data generation task as a mathematical optimization problem, which minimizes the unfairness of the targeted recommender systems while not disrupting the deployed recommendation algorithms. Extensive experiments show that our proposed antidote data generation algorithm significantly improve the fairness of recommender systems with a small amounts of antidote data.
Equal Experience in Recommender Systems
Cho, Jaewoong, Choi, Moonseok, Suh, Changho
We explore the fairness issue that arises in recommender systems. Biased data due to inherent stereotypes of particular groups (e.g., male students' average rating on mathematics is often higher than that on humanities, and vice versa for females) may yield a limited scope of suggested items to a certain group of users. Our main contribution lies in the introduction of a novel fairness notion (that we call equal experience), which can serve to regulate such unfairness in the presence of biased data. The notion captures the degree of the equal experience of item recommendations across distinct groups. We propose an optimization framework that incorporates the fairness notion as a regularization term, as well as introduce computationally-efficient algorithms that solve the optimization. Experiments on synthetic and benchmark real datasets demonstrate that the proposed framework can indeed mitigate such unfairness while exhibiting a minor degradation of recommendation accuracy.
Recognition Networks for Approximate Inference in BN20 Networks
We propose using recognition networks for approximate inference inBayesian networks (BNs). A recognition network is a multilayerperception (MLP) trained to predict posterior marginals given observedevidence in a particular BN. The input to the MLP is a vector of thestates of the evidential nodes. The activity of an output unit isinterpreted as a prediction of the posterior marginal of thecorresponding variable. The MLP is trained using samples generated fromthe corresponding BN.We evaluate a recognition network that was trained to do inference ina large Bayesian network, similar in structure and complexity to theQuick Medical Reference, Decision Theoretic (QMR-DT). Our networkis a binary, two-layer, noisy-OR network containing over 4000 potentially observable nodes and over 600 unobservable, hidden nodes. Inreal medical diagnosis, most observables are unavailable, and there isa complex and unknown bias that selects which ones are provided. Weincorporate a very basic type of selection bias in our network: a knownpreference that available observables are positive rather than negative.Even this simple bias has a significant effect on the posterior. We compare the performance of our recognition network tostate-of-the-art approximate inference algorithms on a large set oftest cases. In order to evaluate the effect of our simplistic modelof the selection bias, we evaluate algorithms using a variety ofincorrectly modeled observation biases. Recognition networks performwell using both correct and incorrect observation biases.